How to Build a FreeBSD Kernel
Module From Scratch


This workshop is designed to help you understand how the userland
communicates with the kernel by studying the workflow of an existing
example, so that by the end you will be able to extend it or write a module of
your own.


Module 1: FreeBSD Kernel Module 


In this module, we give an overview of the FreeBSD kernel. We explain the
important configuration files and show how to compile the whole system with
more options and more debugging information enabled, which is very useful
for kernel development.


Module 2: IPFW2 Userland and Kernel Workflow 


In this module, we'll have an overview of ipfw2 - both userland and kernel side - 
and how they both interact. 


Module 3: Through The Userland to Kernel Codes 


In this module, we'll have an overview of ipfw2, both the userland and the
kernel side, and how they interact. First, we will see how to use the sysctl
interface to set simple values. Then we will see how settings are communicated
to the kernel via a socket, following the code path from the userland to the kernel.


Module 4: DUMMYNET Module Workflow Study 


In this last module, we'll look not only at ipfw's communication with the kernel
but also at how the firewall configuration and rules are handled. We will go through
the dummynet module, its workflow and how it operates with the kernel, so that you
will be able to add new opcodes on your own.


David Carlier 


He is an experienced developer who has worked with languages such as C/C++,
Java and Python on Linux, *BSD and Win32 operating systems, in startups as
well as in bigger companies.

He is personally a big fan of FreeBSD and OpenBSD. C and C++ are his preferred
programming languages most of the time.

He writes and reviews articles for BSDMag
http://www.bsdmag.org.

He contributes modestly to OpenBSD ports and, from time to time, to the source.

He has been interviewed by the BSDNow
show http://www.bsdnow.tv/episodes/2017_10_18-software_is_storytelling.

He has made some small contributions to the FreeBSD and DragonFly BSD operating
systems.
Module 1: FreeBSD
Kernel Module 


In this module, we give an overview of the FreeBSD kernel. We explain the important
configuration files and show how to compile the whole system with more options and more
debugging information enabled, which is very useful for kernel development.


Requirements:

- FreeBSD 10.x.
- A machine with at least 4 cores is recommended for the system compilation.
- Genuine hardware or a virtualized environment, at your convenience.

1/ The FreeBSD Kernel


FreeBSD, like many kernels, is a monolithic kernel with loadable module support. It is possible to build
the FreeBSD kernel with all needed modules compiled in statically or, for those modules that support it,
as separate dynamically loadable modules.

Dynamic modules can be loaded and unloaded at will with kldload/kldunload, or loaded at boot time with
<name of kernel module>_load="YES" in the /boot/loader.conf file.


To have an overview of all currently loaded modules, you can type kldstat. For example, the output 
looks like the following: 


Id Refs Address            Size     Name
 1   19 0xffffffff80200000 ...      kernel
 2    1 ...                ...      ng_ubt.ko
...   1 ...                ...      ng_socket.ko


Let’s load the DTrace module by typing: 


kldload dtraceall 


Then, if we type kldstat again, we should see some new entries related to this module:

10    1 0xffffffff81e5f000 89e      dtraceall.ko
11   11 0xffffffff81e60000 9964     opensolaris.ko
12   10 0xffffffff81e6a000 857dba   dtrace.ko


If we add dtraceall_load="YES" to /boot/loader.conf, the DTrace framework is available after reboot. You
can find an excellent introduction to DTrace in the December 2016 issue of BSDmag
(http://bsdmag.org/download/samba-nfs-and-firewall-new-bsd-issue/).

DTrace is useful, for instance, for tracing syscalls.

2/ Configuration


To build the system, we need the source code for both the kernel and the userland. The userland is simply
the set of all the base utilities of FreeBSD. The kernel and userland code are closely tied together and are
available in the same Subversion repository. Apart from pure BSD code, we can find GNU libraries and
software (called contrib code). In addition, some CDDL-licensed code is present, needed to build ZFS,
CTF (Compact C Type Format, a debug section similar to the DWARF format but smaller) and DTrace.
Conveniently, FreeBSD ships Subversion in base under a distinct name, svnlite, to avoid colliding with
the port version.

Check out the source into /usr/src with svnlite as a privileged user (with sudo or as root):

svnlite co https://svn0.us-east.freebsd.org/base/stable/10 /usr/src

You can also check out the current branch, which has much newer code but is less stable, by replacing
stable/10 with head.

To better understand how the kernel options work, we will have a look at the
/usr/src/sys/conf/options file.

This file tells the kernel build which option symbols exist and which generated header file defines each
of them.


# $FreeBSD$
#
# Format of this file:
# Option name filename
#
# If filename is missing, the default is
# opt_<name-of-option-in-lower-case>.h

AAC_DEBUG               opt_aac.h
AACRAID_DEBUG           opt_aacraid.h
AHC_ALLOW_MEMIO         opt_aic7xxx.h
AHC_TMODE_ENABLE        opt_aic7xxx.h
AHC_DUMP_EEPROM         opt_aic7xxx.h
AHC_DEBUG               opt_aic7xxx.h
AHC_DEBUG_OPTS          opt_aic7xxx.h
AHC_REG_PRETTY_PRINT    opt_aic7xxx.h
AHD_DEBUG               opt_aic79xx.h
AHD_DEBUG_OPTS          opt_aic79xx.h
AHD_TMODE_ENABLE        opt_aic79xx.h
AHD_REG_PRETTY_PRINT    opt_aic79xx.h
ADW_ALLOW_MEMIO         opt_adw.h


There are up to two fields on each line: the option's name and the header file that is generated with the
corresponding preprocessor definition. If the option is present in your kernel configuration file, say
AHD_DEBUG_OPTS, the code can test whether AHD_DEBUG_OPTS is defined and provide code specific
to this option.

Let's imagine we wrote a shiny new kernel module. We could add our own line to this file:

BSDMAG opt_bsdmag.h
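To make the role of the generated header more concrete, here is a minimal sketch of how C code typically guards option-specific behaviour. In a real kernel build, the source file would include the generated opt_bsdmag.h. Here the BSDMAG macro is defined by hand so that the snippet compiles on its own in userland, and bsdmag_enabled() and bsdmag_report() are hypothetical helpers, not part of FreeBSD.

```c
#include <stdio.h>

/*
 * In a kernel build this line would be: #include "opt_bsdmag.h"
 * The build generates that header and defines BSDMAG in it only when
 * "options BSDMAG" appears in the kernel configuration file.
 * Assumption: we define the macro by hand so this sketch builds in userland.
 */
#define BSDMAG 1

/* Hypothetical helper: returns 1 when built with the BSDMAG option, else 0. */
static int
bsdmag_enabled(void)
{
#ifdef BSDMAG
    return (1);
#else
    return (0);
#endif
}

/* Hypothetical helper: prints a message only in BSDMAG-enabled builds. */
static void
bsdmag_report(void)
{
    if (bsdmag_enabled())
        printf("bsdmag: option-specific code is compiled in\n");
}
```

Removing the #define (that is, leaving the option out of the kernel configuration) makes the same source compile without the option-specific branch.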


Another important file is /usr/src/sys/conf/files. This file indicates which source files
need to be included in the build process.


cam/cam.c                       optional scbus
cam/cam_compat.c                optional scbus
cam/cam_periph.c                optional scbus
cam/cam_queue.c                 optional scbus
cam/cam_sim.c                   optional scbus
cam/cam_xpt.c                   optional scbus
cam/ata/ata_all.c               optional scbus
cam/ata/ata_xpt.c               optional scbus
cam/ata/ata_pmp.c               optional scbus
cam/scsi/scsi_xpt.c             optional scbus
cam/scsi/scsi_all.c             optional scbus
cam/scsi/scsi_cd.c              optional cd
cam/scsi/scsi_ch.c              optional ch
cam/ata/ata_da.c                optional ada | da
cam/ctl/ctl.c                   optional ctl
cam/ctl/ctl_backend.c           optional ctl
cam/ctl/ctl_backend_block.c     optional ctl
cam/ctl/ctl_backend_ramdisk.c   optional ctl
cam/ctl/ctl_cmd_table.c         optional ctl
cam/ctl/ctl_frontend.c          optional ctl
cam/ctl/ctl_frontend_cam_sim.c  optional ctl
cam/ctl/ctl_frontend_internal.c optional ctl
cam/ctl/ctl_frontend_iscsi.c    optional ctl
cam/ctl/ctl_scsi_all.c          optional ctl
cam/ctl/ctl_tpc.c               optional ctl
cam/ctl/ctl_tpc_local.c         optional ctl
cam/ctl/ctl_error.c             optional ctl
cam/ctl/ctl_util.c              optional ctl
cam/ctl/scsi_ctl.c              optional ctl
cam/scsi/scsi_da.c              optional da
cam/scsi/scsi_low.c             optional ct | ncv | nsp | stg
cam/scsi/scsi_pass.c            optional pass
cam/scsi/scsi_pt.c              optional pt
cam/scsi/scsi_sa.c              optional sa


Each line has the following fields: the path of the file relative to sys, and the type of the entry. If the
type is optional, the file is compiled only when the (lower-case) option name written afterwards is
present in the kernel configuration.

Again, for our new module, we can add our own kernel module C file. For example, for the file
workshop_module1.c:

workshop_bsdmagmodule1.c optional bsdmag
3/ Build

Now we can create a custom kernel configuration. Let's call it WORKSHOP.

cp /usr/src/sys/<arch>/conf/GENERIC /usr/src/sys/<arch>/conf/WORKSHOP

echo "KERNCONF=WORKSHOP" >> /etc/make.conf

With this setting, the build picks up the new WORKSHOP configuration file instead of the default
GENERIC one.


Steps to build a system:

First, the userland needs to be compiled.

Go to /usr/src.

If your machine has multiple cores, it is advised to use them for the system compilation.

The build might take several hours depending on your configuration.

Build the userland:

make -j<number of cores+1> buildworld

Build the kernel:

make -j<number of cores+1> buildkernel

Install the kernel:

make installkernel

This installs the new kernel under /boot/kernel.

It is possible to build the userland, then build and install the kernel, with a single command:

make -j<number of cores+1> buildworld kernel
Restart in single-user mode (via the command line with shutdown -r, or via the boot menu) and
go to /usr/src.

Then run mergemaster to merge the configuration and other files of the current system with the new
ones that were built.

mergemaster -p

Then, install the userland:

make installworld

then:

mergemaster -FUi

mergemaster will try to merge the various configuration files and ask you how you wish to proceed for
each one: merging where possible, replacing it with the newer version, or keeping the existing file. Be
careful: the system will attempt to merge configuration files in various places, notably those related to
ssh and to users and groups, so take care not to delete additional settings, users or groups in the
process. When a merge is needed, mergemaster always asks how you want to merge and lets you do so
in your preferred editor.


Restart in normal mode. 


You should now have a working system with the latest fixes and patches for the 10.x branch. As
developers, however, we might need more information from the system for debugging, such as the core
dump written after a system crash or kernel panic. It is advisable to enable kernel core dumps (this can
be done when installing FreeBSD or, afterwards, via the dumpdev rc.conf variable), at the cost of disk
space: dumps can be large, so deleting old ones is necessary. By default, they are located in /var/crash
(the dumpdir rc.conf variable). To debug a kernel crash dump, the kernel compiled with debugging
symbols, kernel.debug, is necessary. kgdb can be used in the following way.


kgdb 

GNU gdb 6.1.1 [FreeBSD] 

Copyright 2004 Free Software Foundation, Inc. 

GDB is free software, covered by the GNU General Public License, and you are 
welcome to change it and/or distribute copies of it under certain conditions. 
Type "show copying" to see the conditions. 

There is absolutely no warranty for GDB. Type "show warranty" for details. 


This GDB was configured as "amd64-marcel-freebsd"... 
Reading symbols from /boot/kernel/ng_ubt.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_ubt.ko.symbols
Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
Loaded symbols for /boot/kernel/netgraph.ko.symbols
Reading symbols from /boot/kernel/ng_hci.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_hci.ko.symbols
Reading symbols from /boot/kernel/ng_bluetooth.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_bluetooth.ko.symbols
Reading symbols from /boot/kernel/ng_l2cap.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_l2cap.ko.symbols
Reading symbols from /boot/kernel/ng_btsocket.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_btsocket.ko.symbols
Reading symbols from /boot/kernel/ng_socket.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_socket.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/dtmalloc.ko.symbols...done.
Loaded symbols for /boot/kernel/dtmalloc.ko.symbols
Reading symbols from /boot/kernel/dtnfscl.ko.symbols...done.
Loaded symbols for /boot/kernel/dtnfscl.ko.symbols
Reading symbols from /boot/kernel/fbt.ko.symbols...done.
Loaded symbols for /boot/kernel/fbt.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fasttrap.ko.symbols
Reading symbols from /boot/kernel/lockstat.ko.symbols...done.
Loaded symbols for /boot/kernel/lockstat.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols


#0 sched_switch (td=0xfffff8011b80a940, newtd=<value optimized out>,
flags=-2123250552) at /usr/src/sys/kern/sched_ule.c:1940

1940            cpuid = PCPU_GET(cpuid);

As with its userland counterpart gdb, we can use backtrace (bt).


(kgdb) backtrace

#0 sched_switch (td=0xfffff8011b80a940, newtd=<value optimized out>,
flags=-2123250552) at /usr/src/sys/kern/sched_ule.c:1940

#1 0x... in mi_switch (flags=Unhandled dwarf expression opcode 0x93
) at /usr/src/sys/kern/kern_synch.c:492

#2 0x... in sleepq_switch (wchan=<value optimized out>,
pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:552

#3 0x... in sleepq_wait (wchan=0x..., pri=Unhandled
dwarf expression opcode 0x93
) at /usr/src/sys/kern/subr_sleepqueue.c:631

#4 0xffffffff8095aa47 in _sleep (ident=0x0, lock=0xfffff80115316230,
priority=0, wmesg=0xffffffff80ff47f2 "-", sbt=0, pr=0, flags=<value optimized
out>) at /usr/src/sys/kern/kern_synch.c:254

#5 0x... in taskqueue_thread_loop (arg=<value optimized out>) at
/usr/src/sys/kern/subr_taskqueue.c:118

#6 0x... in fork_exit (callout=0x...
<taskqueue_thread_loop>, arg=0x..., frame=0x...) at
/usr/src/sys/kern/kern_fork.c:996

#7 0xffffffff80d4f4fe in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:610

#8 0x0000000000000000 in ?? ()


We can also use list to display the source code around a given address, here the one from frame
number 4 above:

(kgdb) list *0xffffffff8095aa47
0xffffffff8095aa47 is in _sleep (/usr/src/sys/kern/kern_synch.c:254).
249         else if (sbt != 0)
250             rval = sleepq_timedwait(ident, pri);
251         else if (catch)
252             rval = sleepq_wait_sig(ident, pri);
253         else {
254             sleepq_wait(ident, pri);
255             rval = 0;
256         }
257 #ifdef KTRACE
258         if (KTRPOINT(td, KTR_CSW))

Then we can, for example, move up through the stack frames, and so on:

(kgdb) up 2
#4 0xffffffff8095aa47 in _sleep (ident=0x0, lock=0xfffff80115316230,
priority=0, wmesg=0xffffffff80ff47f2 "-", sbt=0, pr=0,
flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:254
254             sleepq_wait(ident, pri);
(kgdb) list
249         else if (sbt != 0)
250             rval = sleepq_timedwait(ident, pri);
251         else if (catch)
252             rval = sleepq_wait_sig(ident, pri);
253         else {
254             sleepq_wait(ident, pri);
255             rval = 0;
256         }
257 #ifdef KTRACE
258         if (KTRPOINT(td, KTR_CSW))


For more information about gdb, a good course exists at:

http://bsdmag.org/course/application-debugging-and-troubleshooting-2

If you run the -CURRENT branch, the kernel can crash for various reasons, and kgdb is
handy to get a basic understanding of the reasons for the crash.


Detecting potential deadlocks.

FreeBSD does not rely on the Giant lock model anymore. Instead, it uses fine-grained locking. The
resulting programming can be tricky, and it is easy to run into lock contention or deadlocks. As a
kernel developer, you can enable the WITNESS* kernel options to detect lock order problems and
circular lock dependencies; beware, however, that the system becomes pretty slow if
WITNESS_SKIPSPIN (which basically skips spin locks) is not enabled.

Having read this workshop module, you have learned the basics of custom kernel configuration and of
compiling the whole system.
Exercise

To enable those additional capabilities (deadlock detection, more debugging information, and
additional checking), which options in conf/options are needed?

- Recompile the kernel with those options.

- Once the system is restarted, which differences do you notice? What are the downsides, if
any?

- Is it possible to improve the situation? How?
Module 2: IPFW2
Userland and Kernel 
Workflow 


In this module, we'll have an overview of ipfw2 - both userland and kernel side - and how they both 
interact. 


1/ IPFW command line settings via sysctl 


The ipfw userland source can be found in the FreeBSD source tree we checked out with svn in the first
module:

<source code root path>/sbin/ipfw/

In the previous module, we saw the basics of a kernel module. IPFW works differently: it enables and
disables features via sysctl. If you have done some FreeBSD programming, you might have used it
already; if calls like sysctl / sysctlbyname / sysctlnametomib are familiar, you can jump directly to the
next chapter.

Otherwise, IPFW uses sysctlbyname and sysctl. Here are their function signatures:

int sysctlbyname(const char *name, void *oldp, size_t *oldlenp, const void
*newp, size_t newlen);

int sysctl(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const
void *newp, size_t newlen);


If you wish to get a value, the oldp and oldlenp arguments need to be used. To set a value, use the 
newp and newlen arguments. 
For example, to get the number of CPUs available:

int nbcpu;
size_t nbcpulen = sizeof(nbcpu);

if (sysctlbyname("hw.ncpu", &nbcpu, &nbcpulen, NULL, 0) == 0)
    printf("%d cpus\n", nbcpu);

Alternatively:

int mib[2];

mib[0] = CTL_HW;
mib[1] = HW_NCPU;

/* namelen is the number of elements in mib, not its size in bytes */
if (sysctl(mib, 2, &nbcpu, &nbcpulen, NULL, 0) == 0)
    printf("%d cpus\n", nbcpu);

Or the opposite, setting a value, like the maximum number of file descriptors:

int maxfiles = 4096;
size_t maxfileslen = sizeof(maxfiles);

if (sysctlbyname("kern.maxfiles", NULL, 0, &maxfiles, maxfileslen) == 0) ...

Alternatively:

mib[0] = CTL_KERN;
mib[1] = KERN_MAXFILES;

if (sysctl(mib, 2, NULL, 0, &maxfiles, maxfileslen) == 0) ...
IPFW uses the sysctl* family of functions to turn the firewall itself on and off, or to make ipfw more
verbose.

Here is the code where the firewall is enabled/disabled:

/* EXPLANATION: av holds the parameters passed on the command line */
} else if (_substrcmp(*av, "firewall") == 0) {
    sysctlbyname("net.inet.ip.fw.enable", NULL, 0,
        &which, sizeof(which));
    sysctlbyname("net.inet6.ip6.fw.enable", NULL, 0,
        &which, sizeof(which));


2/ IPFW command line settings via socket 


In addition, ipfw uses a socket, for example to add a rule via an identified optname.

Here is a sample of the code responsible for getting settings from the kernel:

static int
table_do_modify_record(int cmd, ipfw_obj_header *oh,
    ipfw_obj_tentry *tent, int count, int atomic)
{
    ipfw_obj_ctlv *ctlv;
    ipfw_obj_tentry *tent_base;
    caddr_t pbuf;
    char xbuf[sizeof(*oh) + sizeof(ipfw_obj_ctlv) + sizeof(*tent)];
    int error;
    size_t sz;
    ...
    error = do_get3(cmd, &oh->opheader, &sz);


int
do_get3(int optname, ip_fw3_opheader *op3, size_t *optlen)
{
    int error;

    if (co.test_only)
        return (0);

    if (ipfw_socket == -1)
        ipfw_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (ipfw_socket < 0)
        err(EX_UNAVAILABLE, "socket");

    op3->opcode = optname;

    error = getsockopt(ipfw_socket, IPPROTO_IP, IP_FW3, op3,
        (socklen_t *)optlen);

    return (error);
}
An ip_fw3_opheader structure needs to be passed. Here is its raw definition:

typedef struct _ip_fw3_opheader {
    uint16_t opcode;      /* Operation identifier */
    uint16_t version;     /* Opcode version */
    uint16_t reserved[2]; /* padding */
} ip_fw3_opheader;
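The following sketch illustrates why this header matters: every request buffer handed to the socket option starts with it, and the payload follows. The struct demo_opheader below is only an illustrative stand-in with the same shape (two 16-bit fields plus padding), and struct demo_request and demo_request_init() are hypothetical names introduced for this example, not ipfw code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the op header: two 16-bit fields plus padding. */
struct demo_opheader {
    uint16_t opcode;      /* operation identifier */
    uint16_t version;     /* version of the request layout */
    uint16_t reserved[2]; /* padding up to 8 bytes */
};

/* Hypothetical request: the header comes first, the payload follows it. */
struct demo_request {
    struct demo_opheader opheader;
    uint32_t payload;
};

/* Zero the whole request and stamp the operation identifier into its header. */
static void
demo_request_init(struct demo_request *req, uint16_t opcode)
{
    memset(req, 0, sizeof(*req));
    req->opheader.opcode = opcode;
}
```

Because the header sits at offset 0, a function that only knows about the header type can still receive a pointer to any larger request, which is how a single entry point can serve many different commands.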


ipfw3 command list sample (sys/netinet/ip_fw.h):

#define IP_FW_TABLE_XADD      86  /* add entry */
#define IP_FW_TABLE_XDEL      87  /* delete entry */
#define IP_FW_TABLE_XGETSIZE  88  /* get table size (deprecated) */
#define IP_FW_TABLE_XLIST     89  /* list table contents */
#define IP_FW_TABLE_XDESTROY  90  /* destroy table */
#define IP_FW_TABLES_XLIST    92  /* list all tables */
#define IP_FW_DUMP_SOPTCODES  116 /* Dump available sopts/versions */


And here is the list of available opcodes (aka the representation of ipfw rules):

enum ipfw_opcodes {
    O_NOP,

    O_IP_SRC,       /* u32 = IP */
    O_IP_SRC_MASK,  /* ip = IP/mask */
    O_IP_SRC_ME,    /* none */
    O_IP_SRC_SET,   /* u32=base, arg1=len, bitmap */

    O_IP_DST,       /* u32 = IP */
    O_IP_DST_MASK,  /* ip = IP/mask */
    O_IP_DST_ME,    /* none */
    O_IP_DST_SET,   /* u32=base, arg1=len, bitmap */

    O_IP_SRCPORT,   /* (n)port list:mask 4 byte ea */
    O_IP_DSTPORT,   /* (n)port list:mask 4 byte ea */
    O_PROTO,        /* arg1=protocol */

    O_MACADDR2,     /* 2 mac addr:mask */
    O_MAC_TYPE,     /* same as srcport */
    ...


3/ IPFW command from userland to the kernel 


Now, let's follow an ipfw command, programmatically speaking, on its way to the kernel.

/* EXPLANATION: here are the table-related commands; we will focus
on printing them. */

> ipfw table ... list

if (co.use_set || try_next) {
    if (_substrcmp(*av, "delete") == 0)
        ipfw_delete(av);
    else if (_substrcmp(*av, "flush") == 0)
        ipfw_flush(co.do_force);
    else if (_substrcmp(*av, "zero") == 0)
        ipfw_zero(ac, av, 0 /* IP_FW_ZERO */);
    else if (_substrcmp(*av, "resetlog") == 0)
        ipfw_zero(ac, av, 1 /* IP_FW_RESETLOG */);
    /* Here print tables and its alias */
    else if (_substrcmp(*av, "print") == 0 ||
        _substrcmp(*av, "list") == 0)
        ipfw_list(ac, av, do_acct);
    else if (_substrcmp(*av, "show") == 0)
        ipfw_list(ac, av, 1 /* show counters */);
    else if (_substrcmp(*av, "table") == 0)
        ipfw_table_handler(ac, av);
    else if (_substrcmp(*av, "internal") == 0)
        ipfw_internal_handler(ac, av);
    else
        errx(EX_USAGE, "bad command `%s'", *av);
}
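The dispatch above compares the first command-line word against each known command in turn. As a minimal sketch of that idea, assuming (as the excerpt suggests, though this is not verified here) that the comparison helper accepts abbreviated commands, the same logic can be written with a small table. prefix_cmp(), demo_dispatch() and the DEMO_* values are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Return 0 when arg is a non-empty prefix of cmd, non-zero otherwise. */
static int
prefix_cmp(const char *arg, const char *cmd)
{
    size_t len = strlen(arg);

    if (len == 0)
        return (1);
    return (strncmp(arg, cmd, len) != 0);
}

enum demo_action { DEMO_DELETE, DEMO_FLUSH, DEMO_LIST, DEMO_BAD };

/* Map the first command-line word to an action; the first match wins. */
static enum demo_action
demo_dispatch(const char *arg)
{
    static const struct {
        const char *name;
        enum demo_action action;
    } table[] = {
        { "delete", DEMO_DELETE },
        { "flush", DEMO_FLUSH },
        { "list", DEMO_LIST },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (prefix_cmp(arg, table[i].name) == 0)
            return (table[i].action);
    return (DEMO_BAD);
}
```

With prefix matching, the order of the entries matters: an abbreviation that fits several commands resolves to whichever one is tested first, which is also true of an if / else if chain.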


void
ipfw_list(int ac, char *av[], int show_counters)
{
    ipfw_cfg_lheader *cfg;
    struct format_opts sfo;
    size_t sz;
    ...
    /* get configuration from kernel */
    cfg = NULL;
    sfo.show_counters = show_counters;
    sfo.show_time = co.do_time;
    sfo.flags = IPFW_CFG_GET_STATIC;
    if (co.do_dynamic != 0)
        sfo.flags |= IPFW_CFG_GET_STATES;
    if ((sfo.show_counters | sfo.show_time) != 0)
        sfo.flags |= IPFW_CFG_GET_COUNTERS;
    if (ipfw_get_config(&co, &sfo, &cfg, &sz) != 0)
        err(EX_OSERR, "retrieving config failed");


static int
ipfw_get_config(struct cmdline_opts *co, struct format_opts *fo,
    ipfw_cfg_lheader **pcfg, size_t *psize)
{
    ipfw_cfg_lheader *cfg;
    size_t sz;
    int i;

    if (co->test_only != 0) {
        fprintf(stderr, "Testing only, list disabled\n");
        return (0);
    }

    /* Start with some data size */
    sz = 4096;
    cfg = NULL;

    for (i = 0; i < 16; i++) {
        if (cfg != NULL)
            free(cfg);
        if ((cfg = calloc(1, sz)) == NULL)
            return (ENOMEM);

        cfg->flags = fo->flags;
        cfg->start_rule = fo->first;
        cfg->end_rule = fo->last;

        /* This is where the command is sent to the kernel with a rough
           memory estimate; we try again with a bigger buffer if the data
           requires more room */
        if (do_get3(IP_FW_XGET, &cfg->opheader, &sz) != 0) {
            if (errno != ENOMEM) {
                free(cfg);
                return (errno);
            }

            /* Buffer size is not enough. Try to increase */
            sz = sz * 2;
            if (sz < cfg->size)
                sz = cfg->size;
            continue;
        }

        *pcfg = cfg;
        *psize = sz;
        return (0);
    }

    free(cfg);
    return (ENOMEM);
}
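The retry loop in ipfw_get_config is a common pattern when the caller cannot know the reply size in advance. Here is a minimal, self-contained sketch of it. demo_query() is a hypothetical stand-in for the kernel side (it is not an ipfw function): it fails with ENOMEM and reports the required size when the buffer is too small. demo_fetch() mirrors the userland loop.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the kernel: it needs `needed` bytes and fails
 * with ENOMEM, reporting the required size, when the buffer is too small.
 */
static int
demo_query(void *buf, size_t *len, size_t needed)
{
    (void)buf;
    if (*len < needed) {
        *len = needed; /* tell the caller how much room is required */
        errno = ENOMEM;
        return (-1);
    }
    *len = needed;
    return (0);
}

/* Allocate, query, and grow the buffer on ENOMEM, for at most 16 attempts. */
static int
demo_fetch(size_t needed, void **pbuf, size_t *psize)
{
    size_t sz = 4096, want;
    void *buf = NULL;
    int i;

    for (i = 0; i < 16; i++) {
        free(buf);
        if ((buf = calloc(1, sz)) == NULL)
            return (ENOMEM);
        want = sz;
        if (demo_query(buf, &want, needed) == 0) {
            *pbuf = buf;
            *psize = want;
            return (0);
        }
        if (errno != ENOMEM) {
            free(buf);
            return (errno);
        }
        sz *= 2;       /* double the estimate ... */
        if (sz < want)
            sz = want; /* ... or jump straight to the reported size */
    }
    free(buf);
    return (ENOMEM);
}
```

Bounding the number of attempts keeps the caller from looping forever if the amount of data keeps growing between two calls.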


Here, the userland part ends and we're going to see what happens in the kernel: 


static int
dump_config(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
    ipfw_cfg_lheader *hdr;
    struct ip_fw *rule;
    size_t sz, rnum;
    uint32_t hdr_flags;
    int error, i;
    struct dump_args da;
    uint32_t *bmask;

    /* Our data lies contiguously, in raw form, in the ipfw_cfg_lheader struct.
       Below, we'll get the data we're interested in, calculating the needed
       memory depending on the flags we passed earlier. */
    hdr = (ipfw_cfg_lheader *)ipfw_get_sopt_header(sd, sizeof(*hdr));

    /* Depending on the flags you passed from the command line, various
       data are going to be displayed */
    if (hdr->flags & IPFW_CFG_GET_STATIC) {
        for (i = da.b; i < da.e; i++) {
            rule = chain->map[i];
            da.rsize += RULEUSIZE1(rule) + sizeof(ipfw_obj_tlv);
            da.rcount++;
            da.tcount += ipfw_mark_table_kidx(chain, rule, bmask);
        }
        /* Add counters if requested */
        if (hdr->flags & IPFW_CFG_GET_COUNTERS) {
            da.rsize += sizeof(struct ip_fw_bcounter) * da.rcount;
            da.rcounters = 1;
        }

        if (da.tcount > 0)
            sz += da.tcount * sizeof(ipfw_obj_ntlv) +
                sizeof(ipfw_obj_ctlv);
        sz += da.rsize + sizeof(ipfw_obj_ctlv);
    }

    if (hdr->flags & IPFW_CFG_GET_STATES)
        sz += ipfw_dyn_get_count() * sizeof(ipfw_obj_dyntlv) +
            sizeof(ipfw_obj_ctlv);


static int
dump_static_rules(struct ip_fw_chain *chain, struct dump_args *da,
    uint32_t *bmask, struct sockopt_data *sd)
{
    int error;
    int i, l;
    uint32_t tcount;
    ipfw_obj_ctlv *ctlv;
    struct ip_fw *krule;
    caddr_t dst;
    ...
    tcount = da->tcount;
    while (tcount > 0) {
        if ((bmask[i / 32] & (1 << (i % 32))) == 0) {
            i++;
            continue;
        }

        if ((error = ipfw_objhash_ntlv(chain, i, sd)) != 0)
            return (error);
        i++;
        tcount--;
    }


int
ipfw_export_table_ntlv(struct ip_fw_chain *ch, uint16_t kidx,
    struct sockopt_data *sd)
{
    struct namedobj_instance *ni;
    struct named_object *no;
    ipfw_obj_ntlv *ntlv;

    ni = CHAIN_TO_NI(ch);

    no = ipfw_objhash_lookup_kidx(ni, kidx);
    KASSERT(no != NULL, ("invalid table kidx passed"));

    ntlv = (ipfw_obj_ntlv *)ipfw_get_sopt_space(sd, sizeof(*ntlv));
    if (ntlv == NULL)
        return (ENOMEM);

    ntlv->head.type = IPFW_TLV_TBL_NAME;
    ntlv->head.length = sizeof(*ntlv);
    ntlv->idx = no->kidx;
    strlcpy(ntlv->name, no->name, sizeof(ntlv->name));

    return (0);
}
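The kernel excerpts above rely on a bitmap (bmask) to remember which table indexes are referenced by the rules, then walk that bitmap to export one name record per set bit. The sketch below isolates this bookkeeping in portable C. All demo_* names are hypothetical and only illustrate the technique; they are not the kernel functions.

```c
#include <stdint.h>

/* Return non-zero when bit `i` is set in a bitmap stored as 32-bit words. */
static int
demo_bit_isset(const uint32_t *bmask, unsigned int i)
{
    return ((bmask[i / 32] & (UINT32_C(1) << (i % 32))) != 0);
}

/* Set bit `i`; return 1 if it was newly set, 0 if it was already set. */
static int
demo_bit_mark(uint32_t *bmask, unsigned int i)
{
    if (demo_bit_isset(bmask, i))
        return (0);
    bmask[i / 32] |= UINT32_C(1) << (i % 32);
    return (1);
}

/*
 * Store the indexes of the first `count` set bits in `out`, scanning at
 * most `nbits` bits. Returns the number of indexes actually stored.
 */
static unsigned int
demo_collect(const uint32_t *bmask, unsigned int nbits, unsigned int count,
    unsigned int *out)
{
    unsigned int i = 0, found = 0;

    while (found < count && i < nbits) {
        if (!demo_bit_isset(bmask, i)) {
            i++;
            continue;
        }
        out[found++] = i;
        i++;
    }
    return (found);
}
```

Summing the return values of demo_bit_mark() gives the number of distinct indexes seen, which is the kind of count a caller can use to size the reply before filling it in.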
Now we have a better idea of how ipfw is structured and how it works. Basically, sysctl is used for
boolean configuration values, and the socket is used for the firewall rules, table settings and so on. By
now, some ideas might be starting to emerge for the next module.

In the next and last module, we will have an overview of how the kernel side handles the firewall
rules and configuration, to see how to develop a new type of rule.
Exercises

- To add a new sysctl configuration entry point, explain which part of the kernel needs to be
updated and how (what are the requirements?).

- We saw the list of available opcodes used to configure the kernel or to get information from it. If it is
possible to add one, what are the requirements for the enum ipfw_opcodes values? What are the
requirements for struct ipfw_insn (and derived) structs in terms of alignment?

- Considering a new feature to add to ipfw, and in case third-party code is used, how should the work
be shared between the userland and the kernel side?
Module 3: Through
The Userland to
Kernel Codes

In this module, we'll have an overview of ipfw2, both the userland and the kernel side, and how they
interact.

First, we will see how to use the sysctl interface to set simple values. Then we will see how settings are
communicated to the kernel via a socket, following the code path from the userland to the kernel.


1/ IPFW command line settings via sysctl 


The ipfw userland source can be found in the FreeBSD source tree we checked out with svn in the first
module:
<source code root path>/sbin/ipfw

In the previous module, we saw that it was possible to enable the userland to interact with the kernel
via a character device. IPFW works differently: it enables and disables features via sysctl. If you have
done some FreeBSD programming and are already familiar with calls like sysctl, sysctlbyname, and
sysctlnametomib, you can jump directly to the next chapter.

IPFW uses sysctlbyname and sysctl. Their signatures are:

int sysctlbyname(const char *name, void *oldp, size_t *oldlenp, const void
*newp, size_t newlen);

int sysctl(const int *name, u_int namelen, void *oldp, size_t *oldlenp, const
void *newp, size_t newlen);


If you wish to get a value, the oldp and oldlenp arguments need to be used. To set a value, use the 
newp and newlen arguments. 
For example, to get the number of CPUs available:

int nbcpu;
size_t nbcpulen = sizeof(nbcpu);

if (sysctlbyname("hw.ncpu", &nbcpu, &nbcpulen, NULL, 0) == 0)
    printf("%d cpus\n", nbcpu);

Alternatively:

int mib[2];

mib[0] = CTL_HW;
mib[1] = HW_NCPU;

/* namelen is the number of elements in mib, not its size in bytes */
if (sysctl(mib, 2, &nbcpu, &nbcpulen, NULL, 0) == 0)
    printf("%d cpus\n", nbcpu);

To set a value, like the maximum number of file descriptors:

int maxfiles = 4096;
size_t maxfileslen = sizeof(maxfiles);

if (sysctlbyname("kern.maxfiles", NULL, 0, &maxfiles, maxfileslen) == 0) ...

Alternatively:

mib[0] = CTL_KERN;
mib[1] = KERN_MAXFILES;

if (sysctl(mib, 2, NULL, 0, &maxfiles, maxfileslen) == 0) ...


IPFW uses the sysctl* family of functions to turn the firewall on/off and to make ipfw more verbose.

Here is the part of the code where the firewall is enabled or disabled, in sbin/ipfw/ipfw2.c. It is
called by the code in the main() function that parses user-supplied arguments.

} else if (_substrcmp(*av, "firewall") == 0) {
    sysctlbyname("net.inet.ip.fw.enable", NULL, 0,
        &which, sizeof(which));
    sysctlbyname("net.inet6.ip6.fw.enable", NULL, 0,
        &which, sizeof(which));


For more information, especially about all possible requests, check the sysctl man page: 


man 3 sysctl 
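To tie the pieces of this chapter together, here is a minimal sketch of a helper that wraps the MIB-based read in a function. demo_cpu_count() is a hypothetical name. The sysconf() fallback is an assumption added only so that the sketch also builds on systems other than FreeBSD; it is not something ipfw does.

```c
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif
#include <stddef.h>
#include <unistd.h>

/*
 * Return the number of CPUs, or -1 on error.
 * On FreeBSD this reads the hw.ncpu value through sysctl(); elsewhere
 * (an assumption, for portability of the sketch) it uses sysconf().
 */
static int
demo_cpu_count(void)
{
#ifdef __FreeBSD__
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu;
    size_t len = sizeof(ncpu);

    /* namelen is the number of integers in mib, not its size in bytes. */
    if (sysctl(mib, sizeof(mib) / sizeof(mib[0]), &ncpu, &len, NULL, 0) != 0)
        return (-1);
    return (ncpu);
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return (n > 0 ? (int)n : -1);
#endif
}
```

Computing the element count from the array itself avoids passing a byte size as namelen, an easy mistake to make with this interface.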


2/ IPFW command line settings via socket 


Here, we need to communicate our settings to the kernel side.
ipfw uses a socket to add a rule via an identified optname/command.

Here is a sample of the code responsible for getting settings from the kernel, from sbin/ipfw/tables.c:

static int
table_do_modify_record(int cmd, ipfw_obj_header *oh,
    ipfw_obj_tentry *tent, int count, int atomic)
{
    ipfw_obj_ctlv *ctlv;
    ipfw_obj_tentry *tent_base;
    caddr_t pbuf;
    char xbuf[sizeof(*oh) + sizeof(ipfw_obj_ctlv) + sizeof(*tent)];
    int error;
    size_t sz;
    ...
    error = do_get3(cmd, &oh->opheader, &sz);


int
do_get3(int optname, ip_fw3_opheader *op3, size_t *optlen)
{
    int error;

    if (co.test_only)
        return (0);

    /* Even though AF_LOCAL could have been used here, IPv4 and IPv6
       matters need to be distinguished. */
    if (ipfw_socket == -1)
        ipfw_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

    /* Communication with the ipfw2 command line happens here (via
       do_cmd); the "get config" command will be coming from there. */
    if (ipfw_socket < 0)
        err(EX_UNAVAILABLE, "socket");

    /* Here we send the programmatic equivalent of the command: the
       opcode is its numeric representation. */
    op3->opcode = optname;

    error = getsockopt(ipfw_socket, IPPROTO_IP, IP_FW3, op3,
        (socklen_t *)optlen);

    return (error);
}


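An ip_fw3_opheader structure needs to be passed. Here is its raw definition:

typedef struct _ip_fw3_opheader {
    uint16_t opcode;        /* Operation identifier */
    uint16_t version;       /* Opcode version */
    uint16_t reserved[2];   /* Padding (align to a 64-bit boundary) */
} ip_fw3_opheader;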
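
To make the role of this header more concrete, here is a minimal userland sketch of a request buffer that starts with such a header. All names prefixed with demo_ are hypothetical and exist only for this example; the struct is a local mirror of the layout above, not the kernel's own type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the header layout described above (hypothetical name). */
struct demo_opheader {
    uint16_t opcode;       /* operation identifier */
    uint16_t version;      /* opcode version */
    uint16_t reserved[2];  /* padding up to 8 bytes */
};

/* Hypothetical request: the header followed by a 32-bit payload. */
struct demo_request {
    struct demo_opheader hdr;
    uint32_t value;
};

/*
 * Fill a request buffer the way a userland tool would before handing it
 * to getsockopt(): zero everything, then set the opcode.
 * Returns the number of bytes that make up the request.
 */
size_t
demo_build_request(struct demo_request *req, uint16_t opcode, uint32_t value)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->hdr.opcode = opcode;
    req->value = value;
    return (sizeof(*req));
}

/* What a receiver does first: read the opcode back from the raw bytes. */
uint16_t
demo_peek_opcode(const void *buf)
{
    struct demo_opheader hdr;

    memcpy(&hdr, buf, sizeof(hdr));
    return (hdr.opcode);
}
```

Putting the header first in every request is what lets one socket option carry many different commands: the receiver only has to look at the first bytes to decide how to interpret the rest.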


Some representative ipfw3 commands are shown below. These are from sys/netinet/ip_fw.h.

#define IP_FW_TABLE_XADD        86  /* add entry */
#define IP_FW_TABLE_XDEL        87  /* delete entry */
#define IP_FW_TABLE_XGETSIZE    88  /* get table size (deprecated) */
#define IP_FW_TABLE_XLIST       89  /* list table contents */
#define IP_FW_TABLE_XDESTROY    90  /* destroy table */
#define IP_FW_TABLES_XLIST      92  /* list all tables */
#define IP_FW_DUMP_SOPTCODES    116 /* Dump available sopts/versions */


And here is the list of available opcodes, also from sys/netinet/ip_fw.h:

enum ipfw_opcodes {
    O_NOP,

    O_IP_SRC,           /* u32 = IP */
    O_IP_SRC_MASK,      /* ip = IP/mask */
    O_IP_SRC_ME,        /* none */
    O_IP_SRC_SET,       /* u32=base, arg1=len, bitmap */

    O_IP_DST,           /* u32 = IP */
    O_IP_DST_MASK,      /* ip = IP/mask */
    O_IP_DST_ME,        /* none */
    O_IP_DST_SET,       /* u32=base, arg1=len, bitmap */

    O_IP_SRCPORT,       /* (n)port list:mask 4 byte ea */
    O_IP_DSTPORT,       /* (n)port list:mask 4 byte ea */
    O_PROTO,            /* arg1=protocol */

    O_MACADDR2,         /* 2 mac addr:mask */
    O_MAC_TYPE,         /* same as srcport */
    /* ... */

3/ IPFW command from userland to the kernel


Now, let's study how an ipfw command makes its way to the kernel from sbin/ipfw/ipfw2.c:

    /* ... */

    if (co.use_set || try_next) {
        if (_substrcmp(*av, "delete") == 0)
            ipfw_delete(av);
        else if (_substrcmp(*av, "flush") == 0)
            ipfw_flush(co.do_force);
        else if (_substrcmp(*av, "zero") == 0)
            ipfw_zero(ac, av, 0 /* IP_FW_ZERO */);
        else if (_substrcmp(*av, "resetlog") == 0)
            ipfw_zero(ac, av, 1 /* IP_FW_RESETLOG */);
        /* Here is an example: we can go through the code flow
           starting from the print/list command */
        else if (_substrcmp(*av, "print") == 0 ||
            _substrcmp(*av, "list") == 0)
            ipfw_list(ac, av, do_acct);
        else if (_substrcmp(*av, "show") == 0)
            ipfw_list(ac, av, 1 /* show counters */);
        else if (_substrcmp(*av, "table") == 0)
            ipfw_table_handler(ac, av);
        else if (_substrcmp(*av, "internal") == 0)
            ipfw_internal_handler(ac, av);
        else
            errx(EX_USAGE, "bad command `%s'", *av);
    }
void
ipfw_list(int ac, char *av[], int show_counters)
{
    ipfw_cfg_lheader *cfg;
    struct format_opts sfo;
    size_t sz;

    /* ... */

    /* get configuration from kernel */
    cfg = NULL;
    sfo.show_counters = show_counters;
    sfo.show_time = co.do_time;
    sfo.flags = IPFW_CFG_GET_STATIC;
    if (co.do_dynamic != 0)
        sfo.flags |= IPFW_CFG_GET_STATES;
    if ((sfo.show_counters | sfo.show_time) != 0)
        sfo.flags |= IPFW_CFG_GET_COUNTERS;

    /* We get the general config from here */
    if (ipfw_get_config(&co, &sfo, &cfg, &sz) != 0)
        err(EX_OSERR, "retrieving config failed");


static int
ipfw_get_config(struct cmdline_opts *co, struct format_opts *fo,
    ipfw_cfg_lheader **pcfg, size_t *psize)
{
    ipfw_cfg_lheader *cfg;
    size_t sz;
    int i;

    if (co->test_only != 0) {
        fprintf(stderr, "Testing only, list disabled\n");
        return (0);
    }

    /* Start with some data size */
    sz = 4096;
    cfg = NULL;

    for (i = 0; i < 16; i++) {
        if (cfg != NULL)
            free(cfg);
        if ((cfg = calloc(1, sz)) == NULL)
            return (ENOMEM);

        cfg->flags = fo->flags;
        cfg->start_rule = fo->first;
        cfg->end_rule = fo->last;

        if (do_get3(IP_FW_XGET, &cfg->opheader, &sz) != 0) {
            if (errno != ENOMEM) {
                free(cfg);
                return (errno);
            }

            /* Buffer size is not enough. Try to increase */
            sz = sz * 2;
            if (sz < cfg->size)
                sz = cfg->size;
            continue;
        }

        *pcfg = cfg;
        *psize = sz;
        return (0);
    }

    free(cfg);
    return (ENOMEM);
}
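
The retry loop above is a general pattern: ask with a guessed buffer size, and grow the buffer when the other side says it is too small. Below is a self-contained sketch of that pattern. The "kernel" is replaced by a stand-in function, and every demo_ name, as well as the DEMO_NEEDED size, is an assumption made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply header: the producer records the size it needs. */
struct demo_cfg {
    uint32_t size;   /* total bytes required for a full reply */
};

#define DEMO_NEEDED 10000u  /* pretend the full configuration is this big */

/*
 * Stand-in for do_get3(): fails with ENOMEM when the buffer is too
 * small, but still reports the required size in the header.
 */
static int
demo_get(struct demo_cfg *cfg, size_t *sz)
{
    cfg->size = DEMO_NEEDED;
    if (*sz < DEMO_NEEDED) {
        errno = ENOMEM;
        return (-1);
    }
    memset(cfg + 1, 0xAB, DEMO_NEEDED - sizeof(*cfg)); /* fake payload */
    *sz = DEMO_NEEDED;
    return (0);
}

/* Retry with a larger buffer until the reply fits or we give up. */
int
demo_fetch(struct demo_cfg **pcfg, size_t *psize)
{
    struct demo_cfg *cfg = NULL;
    size_t sz = 4096;
    int i;

    assert(pcfg != NULL && psize != NULL);
    for (i = 0; i < 16; i++) {
        free(cfg);
        if ((cfg = calloc(1, sz)) == NULL)
            return (ENOMEM);
        if (demo_get(cfg, &sz) == 0) {
            *pcfg = cfg;
            *psize = sz;
            return (0);
        }
        if (errno != ENOMEM) {
            int saved = errno;

            free(cfg);
            return (saved);
        }
        sz *= 2;                 /* double the buffer ... */
        if (sz < cfg->size)
            sz = cfg->size;      /* ... or jump straight to the hint */
    }
    free(cfg);
    return (ENOMEM);
}
```

Bounding the number of attempts keeps the caller from looping forever if the configuration keeps growing between two calls.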


Here, the userland part ends and we're going to see what happens in the kernel:

static int
dump_config(struct ip_fw_chain *chain, ip_fw3_opheader *op3,
    struct sockopt_data *sd)
{
    ipfw_cfg_lheader *hdr;
    struct ip_fw *rule;
    size_t sz, rnum;
    uint32_t hdr_flags;
    int error, i;
    struct dump_args da;
    uint32_t *bmask;

    hdr = (ipfw_cfg_lheader *)ipfw_get_sopt_header(sd, sizeof(*hdr));

    /* ... */

    /* Depending on the flags you passed from the command line,
       various data are going to be returned */
    if (hdr->flags & IPFW_CFG_GET_STATIC) {
        for (i = da.b; i < da.e; i++) {
            rule = chain->map[i];
            da.rsize += RULEUSIZE1(rule) + sizeof(ipfw_obj_tlv);
            da.rcount++;
            da.tcount += ipfw_mark_table_kidx(chain, rule, bmask);
        }
        /* Add counters if requested */
        if (hdr->flags & IPFW_CFG_GET_COUNTERS) {
            da.rsize += sizeof(struct ip_fw_bcounter) * da.rcount;
            da.rcounters = 1;
        }

        if (da.tcount > 0)
            sz += da.tcount * sizeof(ipfw_obj_ntlv) +
                sizeof(ipfw_obj_ctlv);
        sz += da.rsize + sizeof(ipfw_obj_ctlv);
    }

    if (hdr->flags & IPFW_CFG_GET_STATES)
        sz += ipfw_dyn_get_count() * sizeof(ipfw_obj_dyntlv) +
            sizeof(ipfw_obj_ctlv);

    /* ... */


static int
dump_static_rules(struct ip_fw_chain *chain, struct dump_args *da,
    uint32_t *bmask, struct sockopt_data *sd)
{
    int error;
    int i, l;
    uint32_t tcount;
    ipfw_obj_ctlv *ctlv;
    struct ip_fw *krule;
    caddr_t dst;

    /* ... */

    tcount = da->tcount;
    while (tcount > 0) {
        if ((bmask[i / 32] & (1 << (i % 32))) == 0) {
            i++;
            continue;
        }

        if ((error = ipfw_export_table_ntlv(chain, i, sd)) != 0)
            return (error);

        i++;
        tcount--;
    }

    /* ... */

int
ipfw_export_table_ntlv(struct ip_fw_chain *ch, uint16_t kidx,
    struct sockopt_data *sd)
{
    struct namedobj_instance *ni;
    struct named_object *no;
    ipfw_obj_ntlv *ntlv;

    ni = CHAIN_TO_NI(ch);

    no = ipfw_objhash_lookup_kidx(ni, kidx);
    KASSERT(no != NULL, ("invalid table kidx passed"));

    ntlv = (ipfw_obj_ntlv *)ipfw_get_sopt_space(sd, sizeof(*ntlv));
    if (ntlv == NULL)
        return (ENOMEM);

    ntlv->head.type = IPFW_TLV_TBL_NAME;
    ntlv->head.length = sizeof(*ntlv);
    ntlv->idx = no->kidx;
    strlcpy(ntlv->name, no->name, sizeof(ntlv->name));

    return (0);
}
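
The table export loop relies on a bitmap in which bit i is set when table index i is referenced by at least one rule. The sketch below isolates that bitmap walk in plain userland C. The demo_ function names are hypothetical, and storing indexes in an array stands in for exporting one table name per set bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mark index idx in a bitmap made of 32-bit words. */
void
demo_mark(uint32_t *bmask, unsigned idx)
{
    assert(bmask != NULL);
    bmask[idx / 32] |= (uint32_t)1 << (idx % 32);
}

/*
 * Visit the first 'count' set bits, in ascending order, storing their
 * indexes in out[]. The caller promises that at least 'count' bits are
 * set, which is the role tcount plays in the kernel loop.
 */
void
demo_collect(const uint32_t *bmask, unsigned count, unsigned *out)
{
    unsigned i = 0, n = 0;

    while (count > 0) {
        if ((bmask[i / 32] & ((uint32_t)1 << (i % 32))) == 0) {
            i++;
            continue;
        }
        out[n++] = i;   /* stands in for exporting one table name */
        i++;
        count--;
    }
}
```

Counting down the number of remaining set bits lets the loop stop as soon as the last referenced table has been handled, without scanning the rest of the bitmap.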


Now we have a better idea of how ipfw works. Basically, sysctls are used for simple configuration values such as booleans, and sockets for firewall rules, table settings and so on. Some ideas for your own extensions may start to emerge in the next module.

In the next and final module, we will have an overview of how the kernel side handles the firewall rules and configuration, to see how to develop a new type of rule.

Exercises


- To add a new sysctl configuration entry point, explain which part of the kernel needs to be updated and how (what are the requirements?).

- We saw the list of available opcodes used to configure the kernel or to get information from it. If it is possible to add one, what are the requirements for the enum ipfw_opcodes values? What are the requirements for struct ipfw_insn (and derived) structs in terms of alignment?

- Considering a new feature to add to ipfw, and in case third-party code is used, how should the work be shared between the userland and kernel sides?

Module 4:
DUMMYNET Module
Workflow Study


In this last module, we'll look not only at ipfw's communication with the kernel but also at how the firewall configuration and rules are handled.

We will go through the dummynet module, its workflow and how it operates with the kernel, so that you will be able to add new opcodes on your own.

1/ DUMMYNET module study

The dummynet module allows setting network bandwidth limits (called traffic shaping) and is an optional submodule of ipfw. Despite what the name suggests, it is not a kind of fake/no-op module; the name is historical, as it started out as a testing tool. Since it is optional, it has to be enabled via the kernel configuration option DUMMYNET (listed in sys/conf/options). Beware that there are compatibility layers that you might need to take care of when you develop a kernel module: the 32-bit compat layer (if needed), cloudabi (the secure POSIX interface layer) and probably the Linux API compatibility layer.

Programmatically, there are four commands available: configuring (adding) a pipe, deleting a pipe, flushing, and getting the pipe info. A pipe is to be viewed as a workflow queue where every packet belonging to the same queue is treated by the scheduler (for more details, see netpfil/ipfw/dummynet.txt).


#define IP_DUMMYNET_CONFIGURE   60
#define IP_DUMMYNET_DEL         61
#define IP_DUMMYNET_FLUSH       62
#define IP_DUMMYNET_GET         64

=> The dummynet module ought to be loaded after ipfw. The DN_MODEV_ORD definition in sys/netpfil/ipfw/ip_dummynet.c ensures this:

#define DN_SI_SUB       SI_SUB_PROTO_IFATTACHDOMAIN

/* dummynet is guaranteed to start after ipfw, given that the ipfw init
   phase occurs at (SI_ORDER_ANY - 255), giving enough room for modules,
   as you can see */
#define DN_MODEV_ORD    (SI_ORDER_ANY - 128) /* after ipfw */
DECLARE_MODULE(dummynet, dummynet_mod, DN_SI_SUB, DN_MODEV_ORD);
MODULE_DEPEND(dummynet, ipfw, 3, 3, 3);
MODULE_VERSION(dummynet, 3);
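
The effect of the two ordering values can be illustrated with a small userland simulation: entries are sorted by subsystem first and by order second, and lower values are initialised earlier. This is only a model of the idea. The demo_ names are hypothetical and DEMO_ORDER_ANY is an illustrative constant, not a value taken from the kernel headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative value only; the real constant lives in the kernel headers. */
#define DEMO_ORDER_ANY 0x0fffffff

struct demo_init {
    const char *name;
    unsigned subsystem;  /* compared first */
    unsigned order;      /* tie-breaker inside one subsystem */
};

/* Sort by subsystem, then by order: lower values initialise earlier. */
static int
demo_cmp(const void *a, const void *b)
{
    const struct demo_init *x = a, *y = b;

    if (x->subsystem != y->subsystem)
        return (x->subsystem < y->subsystem ? -1 : 1);
    if (x->order != y->order)
        return (x->order < y->order ? -1 : 1);
    return (0);
}

/* Return the position of 'name' after sorting, or -1 if absent. */
int
demo_init_position(struct demo_init *v, size_t n, const char *name)
{
    size_t i;

    assert(v != NULL && name != NULL);
    qsort(v, n, sizeof(*v), demo_cmp);
    for (i = 0; i < n; i++)
        if (strcmp(v[i].name, name) == 0)
            return ((int)i);
    return (-1);
}
```

With ipfw registered at (DEMO_ORDER_ANY - 255) and dummynet at (DEMO_ORDER_ANY - 128) in the same subsystem, ipfw sorts first, which mirrors the guarantee described in the comment above.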


Now, let's see dummynet's configuration part. First, the sockopt data is copied from userland to the kernel.

Then the configuration is applied in either FreeBSD 7.x or FreeBSD 8.x format. For backward compatibility, the old FreeBSD 7 syntax is still supported (the new syntax is much less error prone and more consistent in its pipe usage).

case IP_DUMMYNET_CONFIGURE:
    v = malloc(len, M_TEMP, M_WAITOK);
    error = sooptcopyin(sopt, v, len, len);
    if (error)
        break;
    error = dn_compat_configure(v);
    free(v, M_TEMP);
    break;


static int
dn_compat_configure(void *v)
{
    struct dn_id *buf = NULL, *base;
    struct dn_sch *sch = NULL;
    struct dn_link *p = NULL;
    struct dn_fs *fs = NULL;
    struct dn_profile *pf = NULL;
    int lmax;
    int error;

=> These two structs represent the FreeBSD 7 and 8 pipe configuration formats:
    struct dn_pipe7 *p7 = (struct dn_pipe7 *)v;
    struct dn_pipe8 *p8 = (struct dn_pipe8 *)v;

=> If we have a pipe to configure:
    i = p7->pipe_nr;
    if (i != 0) { /* pipe config */
=> We take chunks of the buffer:
        sch = o_next(&buf, sizeof(*sch), DN_SCH);
        p = o_next(&buf, sizeof(*p), DN_LINK);
        fs = o_next(&buf, sizeof(*fs), DN_FS);

        error = dn_compat_config_pipe(sch, p, fs, v);
=> Here, we carry out the traffic shaping configuration: bandwidth, delay and burst.

static int
dn_compat_config_pipe(struct dn_sch *sch, struct dn_link *p,
    struct dn_fs *fs, void *v)
{
    struct dn_pipe7 *p7 = (struct dn_pipe7 *)v;
    struct dn_pipe8 *p8 = (struct dn_pipe8 *)v;
    int i = p7->pipe_nr;

    sch->sched_nr = i;
    sch->oid.subtype = 0;
    p->link_nr = i;
    fs->fs_nr = i + 2*DN_MAX_ID;
    fs->sched_nr = i + DN_MAX_ID;

    /* Common to 7 and 8 */
    p->bandwidth = p7->bandwidth;
    p->delay = p7->delay;
    if (!is7) {
        /* FreeBSD 8 has burst */
        p->burst = p8->burst;
    }

    /* fill the fifo flowset */
    dn_compat_config_queue(fs, v);
    fs->fs_nr = i + 2*DN_MAX_ID;
    fs->sched_nr = i + DN_MAX_ID;

    /* Move scheduler related parameter from fs to sch */
    sch->buckets = fs->buckets; /*XXX*/
    fs->buckets = 0;
    if (fs->flags & DN_HAVE_MASK) {
        sch->flags |= DN_HAVE_MASK;
        fs->flags &= ~DN_HAVE_MASK;
        sch->sched_mask = fs->flow_mask;
        bzero(&fs->flow_mask, sizeof(struct ipfw_flow_id));
    }

    return 0;
}


=> Once the bandwidth is set (and possibly the extra burst allowance), these settings are used here.

    if (si->idle_time < dn_cfg.curr_time) {
        /* Do this only on the first packet on an idle pipe */
        struct dn_link *p = &fs->sched->link;

        si->sched_time = dn_cfg.curr_time;
        si->credit = dn_cfg.io_fast ? p->bandwidth : 0;
        if (p->burst) {
            uint64_t burst = (dn_cfg.curr_time - si->idle_time) * p->bandwidth;
            if (burst > p->burst)
                burst = p->burst;
            si->credit += burst;
        }
    }
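
The credit computation above can be isolated as a pure function, which makes it easier to reason about with concrete numbers. This is a sketch only: the function and parameter names are hypothetical, and the units are whatever the caller uses consistently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Credit granted to a pipe when the first packet arrives after an idle
 * period. Parameter names are hypothetical; they mirror the fields used
 * in the kernel excerpt (bandwidth, burst, idle time, current time).
 */
uint64_t
demo_initial_credit(uint64_t bandwidth, uint64_t burst_cap,
    uint64_t idle_since, uint64_t now, int io_fast)
{
    uint64_t credit, burst;

    assert(now >= idle_since);
    credit = io_fast ? bandwidth : 0;
    if (burst_cap != 0) {
        /* Credit accumulated while idle, capped at the burst size. */
        burst = (now - idle_since) * bandwidth;
        if (burst > burst_cap)
            burst = burst_cap;
        credit += burst;
    }
    return (credit);
}
```

For instance, with a bandwidth of 100, a burst cap of 500 and two time units of idleness, the burst part is 200; with ten idle time units it would be capped at 500.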


=> Here, the delay and bandwidth limit policies are applied before handing control back to ipfw through dummynet_send.

    m = serve_sched(NULL, si, dn_cfg.curr_time);


static void
dummynet_send(struct mbuf *m)
{
    struct mbuf *n;

    for (; m != NULL; m = n) {
        struct ifnet *ifp = NULL; /* gcc 3.4.6 complains */
        struct m_tag *tag;
        int dst;

        /* ... */

=> IPFW actions
        switch (dst) {
        case DIR_OUT:
            ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL);
            break;

        case DIR_IN:
            netisr_dispatch(NETISR_IP, m);
            break;

#ifdef INET6
        case DIR_IN | PROTO_IPV6:
            netisr_dispatch(NETISR_IPV6, m);
            break;

        case DIR_OUT | PROTO_IPV6:
            ip6_output(m, NULL, NULL, IPV6_FORWARDING, NULL, NULL,
                NULL);
            break;
#endif

        case DIR_FWD | PROTO_IFB: /* DN_TO_IFB_FWD: */
            if (bridge_dn_p != NULL)
                ((*bridge_dn_p)(m, ifp));
            else
                printf("dummynet: if_bridge not loaded\n");
            break;

        case DIR_IN | PROTO_LAYER2: /* DN_TO_ETH_DEMUX: */
            /*
             * The Ethernet code assumes the Ethernet header is
             * contiguous in the first mbuf header.
             * Ensure this is true.
             */
            if (m->m_len < ETHER_HDR_LEN &&
                (m = m_pullup(m, ETHER_HDR_LEN)) == NULL) {
                printf("dummynet/ether: pullup failed, "
                    "dropping packet\n");
                break;
            }
            ether_demux(m->m_pkthdr.rcvif, m);
            break;

        case DIR_OUT | PROTO_LAYER2: /* DN_TO_ETH_OUT: */
            ether_output_frame(ifp, m);
            break;

        case DIR_DROP:
            /* drop the packet after some time */
            FREE_PKT(m);
            break;

        default:
            printf("dummynet: bad switch %d!\n", dst);
            FREE_PKT(m);
            break;
        }
    }
}
2/ IPFW firewall rule route study

Now, let's see how a firewall rule is processed in the kernel.

First, the sockopt handler copies the rule from userland to the kernel and checks its validity.

int
ipfw_ctl(struct sockopt *sopt)
{
#define RULE_MAXSIZE    (512*sizeof(u_int32_t))
    int error;
    size_t size, valsize;
    struct ip_fw *buf;
    struct ip_fw_rule0 *rule;
    struct ip_fw_chain *chain;
    u_int32_t rulenum[2];
    uint32_t opt;
    struct rule_check_info ci;
    IPFW_RLOCK_TRACKER;

    chain = &V_layer3_chain;
    error = 0;

    /* Save original valsize before it is altered via sooptcopyin() */
    valsize = sopt->sopt_valsize;
    opt = sopt->sopt_name;

    /* ... */
    case IP_FW_ADD:
        rule = malloc(RULE_MAXSIZE, M_TEMP, M_WAITOK);

=> Here, we must check whether the rule is in FreeBSD 7.x format; the sopt_valsize field gives this hint after the rule is copied into the kernel. If so, the rule is converted to FreeBSD 8.x format.

        error = sooptcopyin(sopt, rule, RULE_MAXSIZE,
            sizeof(struct ip_fw7));

        /* ... */

        if (error == 0)
            error = check_ipfw_rule0(rule, size, &ci);
        if (error == 0) {
            /* locking is done within add_rule() */
            struct ip_fw *krule;
            krule = ipfw_alloc_rule(chain, RULEKSIZE0(rule));
            ci.urule = (caddr_t)rule;
            ci.krule = krule;
            import_rule0(&ci);
            error = commit_rules(chain, &ci, 1);

=> If the userland requested an answer, the rule is converted back to FreeBSD 7.x format when necessary.

            if (!error && sopt->sopt_dir == SOPT_GET) {
                if (is7) {
                    error = convert_rule_to_7(rule);
                    size = RULESIZE7(rule);
                    if (error) {
                        free(rule, M_TEMP);
                        return error;
                    }
                }

=> Sending back the rule data to the userland:

                error = sooptcopyout(sopt, rule, size);
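
The copy-in step is where untrusted sizes coming from userland are first checked. The sketch below models the kind of bounds handling a copy-in helper performs, under the assumption that a value shorter than the minimum is rejected and a longer one is truncated to the destination size. All demo_ names are hypothetical, and memcpy() stands in for the user-to-kernel copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the user-supplied option value. */
struct demo_sopt {
    const void *val;   /* caller's data */
    size_t valsize;    /* caller's declared length */
};

/*
 * Mimics the bounds handling of a copy-in helper: reject values shorter
 * than minlen, and never copy more than the destination (len) holds.
 * Returns 0 on success or an errno value.
 */
int
demo_copyin(struct demo_sopt *sopt, void *buf, size_t len, size_t minlen)
{
    size_t n;

    assert(sopt != NULL && buf != NULL);
    n = sopt->valsize;
    if (n < minlen)
        return (EINVAL);
    if (n > len)
        sopt->valsize = n = len;   /* truncate to what fits */
    memcpy(buf, sopt->val, n);
    return (0);
}
```

Because the stored size may be altered by the copy, code that needs the caller's original size has to save it first, which is what the "Save original valsize" comment in ipfw_ctl() is about.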


Afterwards, ipfw runs its main rule check, ipfw_chk(), where it is decided whether a packet ought to be accepted, dropped, passed to the dummynet module, and so on.

int
ipfw_chk(struct ip_fw_args *args)
{
    /* ... */

    /* Identify IP packets and fill up variables. */
=> Similar checking is done for IPv4.

    if (pktlen >= sizeof(struct ip6_hdr) &&
        (args->eh == NULL || etype == ETHERTYPE_IPV6) && ip->ip_v == 6) {
        struct ip6_hdr *ip6 = (struct ip6_hdr *)ip;
        is_ipv6 = 1;
        args->f_id.addr_type = 6;
        hlen = sizeof(struct ip6_hdr);
        proto = ip6->ip6_nxt;

        /* Search extension headers to find upper layer protocols */
        while (ulp == NULL && offset == 0) {
            switch (proto) {
            case IPPROTO_ICMPV6:
                PULLUP_TO(hlen, ulp, struct icmp6_hdr);
                icmp6_type = ICMP6(ulp)->icmp6_type;
                break;

            case IPPROTO_TCP:
                PULLUP_TO(hlen, ulp, struct tcphdr);
                dst_port = TCP(ulp)->th_dport;
                src_port = TCP(ulp)->th_sport;
                /* save flags for dynamic rules */
                args->f_id._flags = TCP(ulp)->th_flags;
                break;

            case IPPROTO_SCTP:
                PULLUP_TO(hlen, ulp, struct sctphdr);
                src_port = SCTP(ulp)->src_port;
                dst_port = SCTP(ulp)->dest_port;
                break;

            case IPPROTO_UDP:
                PULLUP_TO(hlen, ulp, struct udphdr);
                dst_port = UDP(ulp)->uh_dport;
                src_port = UDP(ulp)->uh_sport;
                break;

            /* ... */
=> Here comes an important part of your future custom module building: the check against the known rules, where each packet is inspected in the following loop:

    for (; f_pos < chain->n_rules; f_pos++) {
        ipfw_insn *cmd;
        uint32_t tablearg = 0;
        int l, cmdlen, skip_or; /* skip rest of OR block */
        struct ip_fw *f;

        f = chain->map[f_pos];
        if (V_set_disable & (1 << f->set))
            continue;

        skip_or = 0;
        for (l = f->cmd_len, cmd = f->cmd; l > 0;
            l -= cmdlen, cmd += cmdlen) {
            int match;

            /*
             * check_body is a jump target used when we find a
             * CHECK_STATE, and need to jump to the body of
             * the target rule.
             */

/* check_body: */
            cmdlen = F_LEN(cmd);
            /*
             * An OR block (insn_1 || .. || insn_n) has the
             * F_OR bit set in all but the last instruction.
             * The first match will set "skip_or", and cause
             * the following instructions to be skipped until
             * past the one with the F_OR bit clear.
             */
            if (skip_or) {      /* skip this instruction */
                if ((cmd->len & F_OR) == 0)
                    skip_or = 0;    /* next one is good */
                continue;
            }
            match = 0; /* set to 1 if we succeed */

            switch (cmd->opcode) {
            /*
             * The first set of opcodes compares the packet's
             * fields with some pattern, setting 'match' if a
             * match is found. At the end of the loop, there is
             * logic to deal with F_NOT and F_OR flags associated
             * with the opcode.
             */
            case O_NOP:
                match = 1;
                break;

            case O_FORWARD_MAC:
                printf("ipfw: opcode %d unimplemented\n",
                    cmd->opcode);
                break;


            case O_GID:
            case O_UID:
            case O_JAIL:
                /*
                 * We only check offset == 0 && proto != 0,
                 * as this ensures that we have a
                 * packet with the ports info.
                 */
                if (offset != 0)
                    break;
                if (proto == IPPROTO_TCP ||
                    proto == IPPROTO_UDP)
                    match = check_uidgid(
                        (ipfw_insn_u32 *)cmd,
                        args, &ucred_lookup,
#ifdef __FreeBSD__
                        &ucred_cache);
#else
                        (void *)&ucred_cache);
#endif
                break;

            case O_RECV:
                match = iface_match(m->m_pkthdr.rcvif,
                    (ipfw_insn_if *)cmd, chain, &tablearg);
                break;

            case O_MAC_TYPE:
                if (args->eh != NULL) {
                    u_int16_t *p =
                        ((ipfw_insn_u16 *)cmd)->ports;
                    int i;

                    for (i = cmdlen - 1; !match && i > 0;
                        i--, p += 2)
                        match = (etype >= p[0] &&
                            etype <= p[1]);
                }
                break;

            case O_FRAG:
                match = (offset != 0);
                break;

            case O_IN:  /* "out" is "not in" */
                match = (oif == NULL);
                break;

            case O_LAYER2:
                match = (args->eh != NULL);
                break;


            case O_DIVERTED:
                {
                /* For diverted packets, args->rule.info
                 * contains the divert port (in host format)
                 * reason and direction.
                 */
                uint32_t i = args->rule.info;
                match = (i & IPFW_IS_MASK) == IPFW_IS_DIVERT &&
                    cmd->arg1 & ((i & IPFW_INFO_IN) ? 1 : 2);
                }
                break;

            case O_PROTO:
                /*
                 * We do not allow an arg of 0, so the
                 * check of "proto" only suffices.
                 */
                match = (proto == cmd->arg1);
                break;

            case O_IP_SRC:
                match = is_ipv4 &&
                    (((ipfw_insn_ip *)cmd)->addr.s_addr ==
                    src_ip.s_addr);
                break;
            case O_IP_SRC_LOOKUP:
            case O_IP_DST_LOOKUP:
                if (is_ipv4) {
                    uint32_t key =
                        (cmd->opcode == O_IP_DST_LOOKUP) ?
                        dst_ip.s_addr : src_ip.s_addr;
                    uint32_t v = 0;

                    if (cmdlen > F_INSN_SIZE(ipfw_insn_u32)) {
                        /* generic lookup. The key must be
                         * in 32bit big-endian format.
                         */
                        v = ((ipfw_insn_u32 *)cmd)->d[1];
                        if (v == 0)
                            key = dst_ip.s_addr;
                        else if (v == 1)
                            key = src_ip.s_addr;
                        else if (v == 6) /* dscp */
                            key = (ip->ip_tos >> 2) & 0x3f;
                        else if (offset != 0)
                            break;
                        else if (proto != IPPROTO_TCP &&
                            proto != IPPROTO_UDP)
                            break;
                        else if (v == 2)
                            key = dst_port;
                        else if (v == 3)
                            key = src_port;
#ifndef USERSPACE
                        else if (v == 4 || v == 5) {
                            check_uidgid(
                                (ipfw_insn_u32 *)cmd,
                                args, &ucred_lookup,
#ifdef __FreeBSD__
                                &ucred_cache);
                            if (v == 4 /* O_UID */)
                                key = ucred_cache->cr_uid;
                            else if (v == 5 /* O_JAIL */)
                                key = ucred_cache->cr_prison->pr_id;
#else /* !__FreeBSD__ */
                                (void *)&ucred_cache);
                            if (v == 4 /* O_UID */)
                                key = ucred_cache.uid;
                            else if (v == 5 /* O_JAIL */)
                                key = ucred_cache.xid;
#endif /* !__FreeBSD__ */
                        }
#endif /* !USERSPACE */
                        else
                            break;
                    }
                    match = ipfw_lookup_table(chain,
                        cmd->arg1, key, &v);
                    if (!match)
                        break;
                    if (cmdlen == F_INSN_SIZE(ipfw_insn_u32))
                        match =
                            ((ipfw_insn_u32 *)cmd)->d[0] == v;
                    else
                        tablearg = v;
                } else if (is_ipv6) {
                    uint32_t v = 0;
                    void *pkey = (cmd->opcode == O_IP_DST_LOOKUP) ?
                        &args->f_id.dst_ip6 : &args->f_id.src_ip6;
                    match = ipfw_lookup_table_extended(chain,
                        cmd->arg1,
                        sizeof(struct in6_addr),
                        pkey, &v);
                    if (cmdlen == F_INSN_SIZE(ipfw_insn_u32))
                        match = ((ipfw_insn_u32 *)cmd)->d[0] == v;
                    if (match)
                        tablearg = v;
                }
                break;


            case O_IP_SRC_MASK:
            case O_IP_DST_MASK:
                if (is_ipv4) {
                    uint32_t a =
                        (cmd->opcode == O_IP_DST_MASK) ?
                        dst_ip.s_addr : src_ip.s_addr;
                    uint32_t *p = ((ipfw_insn_u32 *)cmd)->d;
                    int i = cmdlen - 1;

                    for (; !match && i > 0; i -= 2, p += 2)
                        match = (p[0] == (a & p[1]));
                }
                break;

            case O_IP_SRC_ME:
                if (is_ipv4) {
                    struct ifnet *tif;

                    INADDR_TO_IFP(src_ip, tif);
                    match = (tif != NULL);
                    break;
                }
#ifdef INET6
                /* FALLTHROUGH */
            case O_IP6_SRC_ME:
                match = is_ipv6 && search_ip6_addr_net(&args->f_id.src_ip6);
#endif
                break;

            case O_IP_DST_SET:
            case O_IP_SRC_SET:
                if (is_ipv4) {
                    u_int32_t *d = (u_int32_t *)(cmd + 1);
                    u_int32_t addr =
                        cmd->opcode == O_IP_DST_SET ?
                        args->f_id.dst_ip :
                        args->f_id.src_ip;

                    if (addr < d[0])
                        break;
                    addr -= d[0]; /* subtract base */
                    match = (addr < cmd->arg1) &&
                        (d[1 + (addr >> 5)] &
                        (1 << (addr & 0x1f)));
                }
                break;

            case O_IP_DST:
                match = is_ipv4 &&
                    (((ipfw_insn_ip *)cmd)->addr.s_addr ==
                    dst_ip.s_addr);
                break;

            case O_IP_DST_ME:
                if (is_ipv4) {
                    struct ifnet *tif;

                    INADDR_TO_IFP(dst_ip, tif);
                    match = (tif != NULL);
                    break;
                }
#ifdef INET6
                /* FALLTHROUGH */
            case O_IP6_DST_ME:
                match = is_ipv6 && search_ip6_addr_net(&args->f_id.dst_ip6);
#endif
                break;
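
Two of the matchers above are compact enough to try out in userland: the address/mask pairs used by the MASK opcodes and the base-plus-bitmap layout used by the SET opcodes. The following sketch reimplements both on plain arrays. The demo_ function names are hypothetical, and addresses are plain host-order integers here to keep the example simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Match 'addr' against n (value, mask) pairs laid out back to back. */
int
demo_mask_match(uint32_t addr, const uint32_t *pairs, size_t n)
{
    size_t i;

    assert(pairs != NULL || n == 0);
    for (i = 0; i < n; i++, pairs += 2)
        if (pairs[0] == (addr & pairs[1]))
            return (1);
    return (0);
}

/*
 * Match 'addr' against a set stored as d[0] = base address followed by a
 * bitmap of 'len' bits, one bit per address starting at the base.
 */
int
demo_set_match(uint32_t addr, const uint32_t *d, uint32_t len)
{
    assert(d != NULL);
    if (addr < d[0])
        return (0);
    addr -= d[0];                       /* subtract base */
    return (addr < len &&
        (d[1 + (addr >> 5)] & ((uint32_t)1 << (addr & 0x1f))) != 0);
}
```

The bitmap form costs one bit per address in the range, so it suits dense sets inside a small range, while mask pairs suit a handful of whole networks.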


            case O_IP_SRCPORT:
            case O_IP_DSTPORT:
                /*
                 * offset == 0 && proto != 0 is enough
                 * to guarantee that we have a
                 * packet with port info.
                 */
                if ((proto == IPPROTO_UDP || proto == IPPROTO_TCP)
                    && offset == 0) {
                    u_int16_t x =
                        (cmd->opcode == O_IP_SRCPORT) ?
                        src_port : dst_port;
                    u_int16_t *p =
                        ((ipfw_insn_u16 *)cmd)->ports;
                    int i;

                    for (i = cmdlen - 1; !match && i > 0;
                        i--, p += 2)
                        match = (x >= p[0] && x <= p[1]);
                }
                break;

            case O_ICMPTYPE:
                match = (offset == 0 && proto == IPPROTO_ICMP &&
                    icmptype_match(ICMP(ulp), (ipfw_insn_u32 *)cmd));
                break;

            case O_LOG:
                ipfw_log(chain, f, hlen, args, m,
                    oif, offset | ip6f_mf, tablearg, ip);
                match = 1;
                break;

            case O_ANTISPOOF:
                /* Outgoing packets automatically pass/match */
                if (oif == NULL && hlen > 0 &&
                    ((is_ipv4 && in_localaddr(src_ip))
#ifdef INET6
                    || (is_ipv6 &&
                    in6_localaddr(&(args->f_id.src_ip6)))
#endif
                    ))
                    match =
#ifdef INET6
                        is_ipv6 ? verify_path6(
                            &(args->f_id.src_ip6),
                            m->m_pkthdr.rcvif,
                            args->f_id.fib) :
#endif
                        verify_path(src_ip,
                            m->m_pkthdr.rcvif,
                            args->f_id.fib);
                else
                    match = 1;
                break;

            case O_IPSEC:
#ifdef IPSEC
                match = (m_tag_find(m,
                    PACKET_TAG_IPSEC_IN_DONE, NULL) != NULL);
#endif
                /* otherwise no match */
                break;


#ifdef INET6
            case O_IP6_SRC:
                match = is_ipv6 &&
                    IN6_ARE_ADDR_EQUAL(&args->f_id.src_ip6,
                    &((ipfw_insn_ip6 *)cmd)->addr6);
                break;

            case O_IP6_DST:
                match = is_ipv6 &&
                    IN6_ARE_ADDR_EQUAL(&args->f_id.dst_ip6,
                    &((ipfw_insn_ip6 *)cmd)->addr6);
                break;

            case O_IP6_SRC_MASK:
            case O_IP6_DST_MASK:
                if (is_ipv6) {
                    int i = cmdlen - 1;
                    struct in6_addr p;
                    struct in6_addr *d =
                        &((ipfw_insn_ip6 *)cmd)->addr6;

                    for (; !match && i > 0; d += 2,
                        i -= F_INSN_SIZE(struct in6_addr)
                        * 2) {
                        p = (cmd->opcode ==
                            O_IP6_SRC_MASK) ?
                            args->f_id.src_ip6 :
                            args->f_id.dst_ip6;
                        APPLY_MASK(&p, &d[1]);
                        match =
                            IN6_ARE_ADDR_EQUAL(&d[0],
                            &p);
                    }
                }
                break;

            case O_FLOW6ID:
                match = is_ipv6 &&
                    flow6id_match(args->f_id.flow_id6,
                    (ipfw_insn_u32 *)cmd);
                break;

            case O_EXT_HDR:
                match = is_ipv6 &&
                    (ext_hd & ((ipfw_insn *)cmd)->arg1);
                break;

            case O_IP6:
                match = is_ipv6;
                break;
#endif

            case O_IP4:
                match = is_ipv4;
                break;


            case O_TAG: {
                struct m_tag *mtag;
                uint32_t tag = TARG(cmd->arg1, tag);

                /* Packet is already tagged with this tag? */
                mtag = m_tag_locate(m, MTAG_IPFW, tag, NULL);

                /* We have `untag' action when F_NOT flag is
                 * present. And we must remove this mtag from
                 * mbuf and reset `match' to zero (`match' will
                 * be inversed later).
                 * Otherwise, we should allocate new mtag and
                 * push it into mbuf.
                 */
                if (cmd->len & F_NOT) { /* `untag' action */
                    if (mtag != NULL)
                        m_tag_delete(m, mtag);
                    match = 0;
                } else {
                    if (mtag == NULL) {
                        mtag = m_tag_alloc(MTAG_IPFW,
                            tag, 0, M_NOWAIT);
                        if (mtag != NULL)
                            m_tag_prepend(m, mtag);
                    }
                    match = 1;
                }
                break;
            }

            case O_TAGGED: {
                struct m_tag *mtag;
                uint32_t tag = TARG(cmd->arg1, tag);

                if (cmdlen == 1) {
                    match = m_tag_locate(m, MTAG_IPFW,
                        tag, NULL) != NULL;
                    break;
                }

                /* we have ranges */
                for (mtag = m_tag_first(m);
                    mtag != NULL && !match;
                    mtag = m_tag_next(m, mtag)) {
                    uint16_t *p;
                    int i;

                    if (mtag->m_tag_cookie != MTAG_IPFW)
                        continue;

                    p = ((ipfw_insn_u16 *)cmd)->ports;
                    i = cmdlen - 1;
                    for (; !match && i > 0; i--, p += 2)
                        match =
                            mtag->m_tag_id >= p[0] &&
                            mtag->m_tag_id <= p[1];
                }
                break;
            }


            case O_LIMIT:
                /* ... */

            case O_ACCEPT:
                retval = 0;     /* accept */
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                break;

            case O_PIPE:
            case O_QUEUE:
                set_match(args, f_pos, chain);
                args->rule.info = TARG(cmd->arg1, pipe);
                if (cmd->opcode == O_PIPE)
                    args->rule.info |= IPFW_IS_PIPE;
                if (V_fw_one_pass)
                    args->rule.info |= IPFW_ONEPASS;
                retval = IP_FW_DUMMYNET;
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                break;

            case O_DIVERT:
            case O_TEE:
                if (args->eh)   /* not on layer 2 */
                    break;
                /* otherwise, this is terminal */
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                retval = (cmd->opcode == O_DIVERT) ?
                    IP_FW_DIVERT : IP_FW_TEE;
                set_match(args, f_pos, chain);
                args->rule.info = TARG(cmd->arg1, divert);
                break;

            case O_REJECT:
                /*
                 * Drop the packet and send a reject notice
                 * if the packet is not ICMP (or is an ICMP
                 * query), and it is not multicast/broadcast.
                 */
                if (hlen > 0 && is_ipv4 && offset == 0 &&
                    (proto != IPPROTO_ICMP ||
                    is_icmp_query(ICMP(ulp))) &&
                    !(m->m_flags & (M_BCAST|M_MCAST)) &&
                    !IN_MULTICAST(ntohl(dst_ip.s_addr))) {
                    send_reject(args, cmd->arg1, iplen, ip);
                    m = args->m;
                }
                /* FALLTHROUGH */
#ifdef INET6
                /* ... */
#endif
            case O_DENY:
                retval = IP_FW_DENY;
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                break;


            case O_FORWARD_IP:
                if (args->eh)   /* not valid on layer2 pkts */
                    break;
                if (q == NULL || q->rule != f ||
                    dyn_dir == MATCH_FORWARD) {
                    struct sockaddr_in *sa;

                    sa = &(((ipfw_insn_sa *)cmd)->sa);
                    if (sa->sin_addr.s_addr == INADDR_ANY) {
#ifdef INET6
                        /*
                         * We use O_FORWARD_IP opcode for
                         * fwd rule with tablearg, but tables
                         * now support IPv6 addresses. And
                         * when we are inspecting IPv6 packet,
                         * we can use nh6 field from
                         * table_value as next_hop6 address.
                         */
                        if (is_ipv6) {
                            struct sockaddr_in6 *sa6;

                            sa6 = args->next_hop6 =
                                &args->hopstore6;
                            sa6->sin6_family = AF_INET6;
                            sa6->sin6_len = sizeof(*sa6);
                            sa6->sin6_addr = TARG_VAL(
                                chain, tablearg, nh6);
                            /*
                             * Set sin6_scope_id only for
                             * link-local unicast addresses.
                             */
                            if (IN6_IS_ADDR_LINKLOCAL(
                                &sa6->sin6_addr))
                                sa6->sin6_scope_id =
                                    TARG_VAL(chain,
                                    tablearg,
                                    zoneid);
                        } else
#endif
                        {
                            sa = args->next_hop =
                                &args->hopstore;
                            sa->sin_family = AF_INET;
                            sa->sin_len = sizeof(*sa);
                            sa->sin_addr.s_addr = htonl(
                                TARG_VAL(chain, tablearg,
                                nh4));
                        }
                    } else {
                        args->next_hop = sa;
                    }
                }
                retval = IP_FW_PASS;
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                break;

            case O_NAT:
                l = 0;          /* exit inner loop */
                done = 1;       /* exit outer loop */
                if (!IPFW_NAT_LOADED) {
                    retval = IP_FW_DENY;
                    break;
                }

                struct cfg_nat *t;
                int nat_id;

                set_match(args, f_pos, chain);
                /* Check if this is 'global' nat rule */
                if (cmd->arg1 == IP_FW_NAT44_GLOBAL) {
                    retval = ipfw_nat_ptr(args, NULL, m);
                    break;
                }
                t = ((ipfw_insn_nat *)cmd)->nat;
                if (t == NULL) {
                    nat_id = TARG(cmd->arg1, nat);
                    t = (*lookup_nat_ptr)(&chain->nat, nat_id);

                    if (t == NULL) {
                        retval = IP_FW_DENY;
                        break;
                    }
                    if (cmd->arg1 != IP_FW_TARG)
                        ((ipfw_insn_nat *)cmd)->nat = t;
                }
                retval = ipfw_nat_ptr(args, t, m);
                break;

=> New types of rules could be added here; as we've seen in the previous modules, new opcodes are needed for that.

            default:
                panic("-- unknown opcode %d\n", cmd->opcode);
            } /* end of switch() on opcodes */
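
To experiment with the matching loop outside the kernel, its control flow can be reduced to a toy evaluator. The sketch below keeps the idea of the listing (one switch per instruction, a "skip rest of OR block" flag, and negation) but everything in it is an assumption made for illustration: the demo_ types, the three opcodes and the flag values are local to this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits and opcodes below are local to this sketch. */
#define DEMO_F_NOT 0x80
#define DEMO_F_OR  0x40

enum demo_opcode { DEMO_O_NOP, DEMO_O_PROTO, DEMO_O_DSTPORT };

struct demo_insn {
    uint8_t  opcode;
    uint8_t  flags;   /* DEMO_F_NOT and/or DEMO_F_OR */
    uint16_t arg1;
};

struct demo_pkt {
    uint16_t proto;
    uint16_t dst_port;
};

/* Return 1 if every instruction (or OR block) of the rule matches. */
int
demo_rule_matches(const struct demo_insn *cmd, size_t n,
    const struct demo_pkt *pkt)
{
    int skip_or = 0;
    size_t i;

    assert(cmd != NULL && pkt != NULL);
    for (i = 0; i < n; i++, cmd++) {
        int match = 0;

        if (skip_or) {               /* an earlier OR member matched */
            if ((cmd->flags & DEMO_F_OR) == 0)
                skip_or = 0;         /* last member: block is over */
            continue;
        }
        switch (cmd->opcode) {
        case DEMO_O_NOP:
            match = 1;
            break;
        case DEMO_O_PROTO:
            match = (pkt->proto == cmd->arg1);
            break;
        case DEMO_O_DSTPORT:
            match = (pkt->dst_port == cmd->arg1);
            break;
        default:
            return (0);              /* unknown opcode: no match */
        }
        if (cmd->flags & DEMO_F_NOT)
            match = !match;
        if (match) {
            if (cmd->flags & DEMO_F_OR)
                skip_or = 1;         /* skip the rest of the block */
        } else if ((cmd->flags & DEMO_F_OR) == 0) {
            return (0);              /* AND semantics: rule fails */
        }
    }
    return (1);
}
```

A rule such as "proto 6 and (dst-port 80 or dst-port 443)" is then three instructions, with the OR flag set on the first port test only. Adding a new kind of test means adding one opcode value and one case in the switch, which is the same shape of change the final exercise asks for in ipfw itself.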


Final exercise

Now is the time to use all the knowledge you acquired through the modules.

The goal is to add a firewall policy (for example, one based on IP addresses / country codes): add the detection algorithm on the kernel side, then the rule configuration on the userland side, updating the ipfw command line accordingly and, above all, the kernel side (preferably as a distinct ipfw kernel module).

David Carlier


